A High Performance FPGA-Based Accelerator for BLAS Library Implementation

Authors

Sébastien Rousseaux

Damien Hubaux

Pierre Guisset

Jean-Didier Legat

Abstract

This paper describes the implementation and performance analysis of a hardware accelerator for the matrix multiplication operation of the BLAS library. The accelerator is based on a dual-FPGA board and on a BLAS software library implementation that makes use of the FPGA-based hardware. To evaluate the performance of such a system, we implemented the matrix multiplication operation (BLAS “dgemm” function) using an optimized FPGA matrix multiplication design, and we implemented the software “dgemm()” function so that it uses the FPGA-based board in a way that is completely transparent to the user. In contrast with other works [2,5,6,10], the measured performance is based on the global runtime of the FPGA-accelerated “dgemm” function at the software level, taking into account both the data transfers between the host computer and the FPGA board and the software pre- and post-processing. We show that, using the developed FPGA-based BLAS accelerator, it is possible to achieve 60% higher performance than a fully software implementation running on a high-end computer. Through a detailed analysis, this paper also shows that the most limiting factors are the data transfers between the host computer memory and the FPGA board memory, and those between this board memory and the FPGA itself.
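For reference, the operation being accelerated is the BLAS “dgemm” update C = alpha·A·B + beta·C. The C sketch below shows its software semantics under simplifying assumptions: column-major storage, no transposition, and a hypothetical `ref_dgemm` signature (the standard CBLAS entry point is `cblas_dgemm`, which also takes storage-order and transpose arguments). It adds a hypothetical first-order cost model, `offload_seconds`, that sums host-to-board transfer time and compute time, in the spirit of the end-to-end measurement described in the abstract. Its rates are placeholders, not figures from the paper, and it deliberately ignores board-memory-to-FPGA traffic and software pre- and post-processing.

```c
#include <stddef.h>

/* Software reference for the BLAS dgemm update in its simplest form:
 * C = alpha * A * B + beta * C, column-major storage, no transposition.
 * A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
 * Hypothetical, simplified signature; not the standard cblas_dgemm API. */
void ref_dgemm(size_t m, size_t n, size_t k, double alpha,
               const double *A, size_t lda,
               const double *B, size_t ldb,
               double beta, double *C, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            /* As in reference BLAS, C is not read on input when beta == 0. */
            if (beta == 0.0)
                C[i + j * ldc] = alpha * acc;
            else
                C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}

/* Hypothetical first-order model of one offloaded dgemm call:
 * A and B cross the host-to-board link once, C comes back once and is
 * also sent out when beta != 0, and compute runs at a fixed sustained rate.
 * Board-memory-to-FPGA traffic and pre/post-processing are ignored.
 * Both rates are caller-supplied placeholders, not measured values. */
double offload_seconds(double m, double n, double k, int beta_nonzero,
                       double link_bytes_per_s, double fpga_flops_per_s)
{
    double words = m * k + k * n + m * n * (beta_nonzero ? 2.0 : 1.0);
    double transfer_s = 8.0 * words / link_bytes_per_s;    /* 8 bytes per double */
    double compute_s = 2.0 * m * n * k / fpga_flops_per_s; /* one multiply and one add per term */
    return transfer_s + compute_s;
}
```

With the placeholder rates of 1e9 bytes/s and 1e9 flop/s, `offload_seconds(1000, 1000, 1000, 0, 1e9, 1e9)` gives about 2.024 s, of which 0.024 s is transfer. Because transfers scale with matrix area while arithmetic scales with volume (2mnk flops), the transfer share grows as matrices shrink or the link slows, which is why a runtime measured at the software level, rather than FPGA-only throughput, is the meaningful figure.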


Similar resources

FPGA accelerator for floating-point matrix multiplication

This study treats the architecture and implementation of an FPGA accelerator for double-precision floating-point matrix multiplication. The architecture is oriented towards minimising resource utilisation and maximising clock frequency. It employs the block matrix multiplication algorithm, which returns the result blocks to the host processor as soon as they are computed. This avoids output buffering...


Field Programmable Gate Array Implementation of Active Control Laws for Multi-mode Vibration Damping

This paper investigates the possibility and effectiveness of multi-mode vibration control of a plate through a real-time FPGA (Field Programmable Gate Array) implementation. This type of embedded system offers truly parallel, high-throughput computation. The control object is an aluminum panel clamped to the upper side of a Perspex box. Two types of control laws are studied. The first belo...


The Implementation of BLAS level 3 on the AP 1000: Preliminary Report

The Basic Linear Algebra Subprogram (BLAS) library is widely used in many supercomputing applications, and is used to implement more extensive linear algebra subroutine libraries, such as LINPACK and LAPACK. To take advantage of the high degree of parallelism of architectures such as the Fujitsu AP1000, BLAS level 3 routines (matrix-matrix operations) are proposed. This project is concerned wit...


Design and Implementation of Digital Demodulator for Frequency Modulated CW Radar (RESEARCH NOTE)

Radar signal processing has been an interesting area of research for the realization of programmable digital signal processors using VLSI design techniques. Digital Signal Processing (DSP) algorithms have been an integral design methodology for the implementation of high-speed, application-specific real-time systems, especially for high-resolution radar. The CORDIC algorithm has, in recent times, turned out to...


BLASFEO: Basic linear algebra subroutines for embedded optimization

BLASFEO is a dense linear algebra library providing high-performance implementations of BLAS- and LAPACK-like routines for use in embedded optimization. A key difference with respect to existing high-performance implementations of BLAS is that the computational performance is optimized for small to medium scale matrices, i.e., for sizes up to a few hundred. BLASFEO comes with three different impl...



Publication date: 2007
